Consider Pillai's trace statistic (denoted as $V$ hereafter) [Pillai, 1955] for a multivariate
analysis of variance (MANOVA) problem with an independent variable $x_{n \times 1}$ and multiple
dependent variables $Y_{n \times k}$, where $Y$ has full column rank. Let us reverse the problem as a linear
multiple regression


$$x = a\mathbf{1} + Yb + e. \tag{1}$$


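Let $\hat{b} = (Y^{T}Y)^{-1}Y^{T}x$ be the least-squares estimate of $b$ (written here for column-centered $Y$; the proof below gives the general form with the intercept). Define a score $s_{n \times 1} = Y\hat{b}$ as a linear
combination of the variables in $Y$, and fit a simple regression model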

$$s = \mu\mathbf{1} + \beta x + \varepsilon. \tag{2}$$

THEOREM 1. Let $V$ be Pillai's trace statistic for MANOVA of $Y$ on $x$, and $\hat{\beta}$ be the
least-squares estimate of $\beta$; then $V = \hat{\beta}$.


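Proof of Theorem 1. Defining $A = (\mathbf{1}, Y)$, we have

$$A^{T}A = \begin{pmatrix} n & \mathbf{1}^{T}Y \\ Y^{T}\mathbf{1} & Y^{T}Y \end{pmatrix}.$$

Then the inverse of $A^{T}A$ is

$$(A^{T}A)^{-1} = \begin{pmatrix} F_{11} & -\frac{1}{n}\mathbf{1}^{T}YF_{22} \\ -\frac{1}{n}F_{22}Y^{T}\mathbf{1} & F_{22} \end{pmatrix},$$

where $F_{11} = \{n - \mathbf{1}^{T}Y(Y^{T}Y)^{-1}Y^{T}\mathbf{1}\}^{-1}$ and $F_{22} = (Y^{T}Y - Y^{T}\mathbf{1}\mathbf{1}^{T}Y/n)^{-1}$.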
Thus

$$\hat{b} = -\frac{1}{n}F_{22}Y^{T}\mathbf{1}\mathbf{1}^{T}x + F_{22}Y^{T}x$$

and

$$s = Y\hat{b} = -\frac{1}{n}YF_{22}Y^{T}\mathbf{1}\mathbf{1}^{T}x + YF_{22}Y^{T}x.$$


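Let $B = (\mathbf{1}, x)$, so that

$$B^{T}B = \begin{pmatrix} n & \mathbf{1}^{T}x \\ x^{T}\mathbf{1} & x^{T}x \end{pmatrix}, \qquad \det(B^{T}B) = n x^{T}x - x^{T}\mathbf{1}\mathbf{1}^{T}x,$$

$$(B^{T}B)^{-1}B^{T} = \frac{1}{\det(B^{T}B)} \begin{pmatrix} x^{T}x\,\mathbf{1}^{T} - x^{T}\mathbf{1}\,x^{T} \\ -\mathbf{1}^{T}x\,\mathbf{1}^{T} + n x^{T} \end{pmatrix}.$$

Therefore

$$\hat{\beta} = (0, 1)(B^{T}B)^{-1}B^{T}s = \frac{1}{\det(B^{T}B)}\left(-\mathbf{1}^{T}x\,\mathbf{1}^{T} + n x^{T}\right)\left(-\frac{1}{n}YF_{22}Y^{T}\mathbf{1}\mathbf{1}^{T}x + YF_{22}Y^{T}x\right). \tag{3}$$
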
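By definition, Pillai's trace is $V = \operatorname{tr}\{(T - E)T^{-1}\}$, where

$$E = Y^{T}\left\{I - B(B^{T}B)^{-1}B^{T}\right\}Y,$$

$$T = Y^{T}\left[I - \frac{1}{n}\mathbf{1}\mathbf{1}^{T}\right]Y. \tag{4}$$

Hence

$$T - E = Y^{T}\left[B(B^{T}B)^{-1}B^{T} - \frac{1}{n}\mathbf{1}\mathbf{1}^{T}\right]Y. \tag{5}$$

Notice in (4), $T^{-1} = F_{22}$, so combining with (3) and (5), we get

$$\begin{aligned}
V &= \operatorname{tr}\{(T - E)T^{-1}\} \\
&= \operatorname{tr}\left\{Y^{T}\left(B(B^{T}B)^{-1}B^{T} - \frac{1}{n}\mathbf{1}\mathbf{1}^{T}\right)YF_{22}\right\} \\
&= \operatorname{tr}\left\{\left(B(B^{T}B)^{-1}B^{T} - \frac{1}{n}\mathbf{1}\mathbf{1}^{T}\right)YF_{22}Y^{T}\right\} \\
&= \operatorname{tr}\left\{\frac{1}{\det(B^{T}B)}\left(\mathbf{1}x^{T}x\mathbf{1}^{T} - \mathbf{1}\mathbf{1}^{T}xx^{T} - \frac{1}{n}\det(B^{T}B)\,\mathbf{1}\mathbf{1}^{T}\right)YF_{22}Y^{T}\right\} \\
&\quad + \operatorname{tr}\left\{\frac{1}{\det(B^{T}B)}\left(-x\mathbf{1}^{T}x\mathbf{1}^{T} + n xx^{T}\right)YF_{22}Y^{T}\right\} \\
&= \frac{1}{\det(B^{T}B)}\operatorname{tr}\left\{\left(\frac{1}{n}\mathbf{1}\mathbf{1}^{T}x\mathbf{1}^{T}x\mathbf{1}^{T} - \mathbf{1}\mathbf{1}^{T}xx^{T}\right)YF_{22}Y^{T}\right\} \\
&\quad + \frac{1}{\det(B^{T}B)}\operatorname{tr}\left\{\left(-\mathbf{1}^{T}x\mathbf{1}^{T} + n x^{T}\right)YF_{22}Y^{T}x\right\} \\
&= \frac{1}{\det(B^{T}B)}\operatorname{tr}\left\{\left(\mathbf{1}^{T}x\mathbf{1}^{T} - n x^{T}\right)\frac{1}{n}YF_{22}Y^{T}\mathbf{1}\mathbf{1}^{T}x\right\} \\
&\quad + \frac{1}{\det(B^{T}B)}\operatorname{tr}\left\{\left(-\mathbf{1}^{T}x\mathbf{1}^{T} + n x^{T}\right)YF_{22}Y^{T}x\right\} \\
&= \frac{1}{\det(B^{T}B)}\left(-\mathbf{1}^{T}x\mathbf{1}^{T} + n x^{T}\right)\left(-\frac{1}{n}YF_{22}Y^{T}\mathbf{1}\mathbf{1}^{T}x + YF_{22}Y^{T}x\right) \\
&= \hat{\beta}.
\end{aligned}$$

□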
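
The identity in Theorem 1 can also be checked numerically. The following is a minimal sketch in plain Python, not part of the original derivation: the helper names (`solve`, `pillai_and_beta`) and the simulated data are illustrative assumptions. It computes $V = \operatorname{tr}\{(T - E)T^{-1}\}$ directly from the definitions of $T$ and $E$, builds the score $s = Y\hat{b}$ from the reverse regression with an intercept, and compares $V$ with the slope of $s$ on $x$.

```python
import random


def solve(A, B):
    """Solve A Z = B (A square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def centered(col):
    """Subtract the mean, i.e. apply (I - 11^T/n) to a vector."""
    m = sum(col) / len(col)
    return [v - m for v in col]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pillai_and_beta(x, Y_cols):
    """Return (V, beta_hat); Y is passed as a list of its k columns."""
    n, k = len(x), len(Y_cols)
    xc = centered(x)
    Yc = [centered(c) for c in Y_cols]
    # T = Y^T (I - 11^T/n) Y, the total SSCP matrix
    T = [[dot(Yc[i], Yc[j]) for j in range(k)] for i in range(k)]
    # E = residual SSCP after regressing each column of Y on (1, x)
    resid = []
    for c in Yc:
        slope = dot(xc, c) / dot(xc, xc)
        resid.append([ci - slope * xi for ci, xi in zip(c, xc)])
    E = [[dot(resid[i], resid[j]) for j in range(k)] for i in range(k)]
    H = [[T[i][j] - E[i][j] for j in range(k)] for i in range(k)]
    # tr{T^{-1}(T - E)} equals tr{(T - E) T^{-1}} by the cyclic property
    Z = solve(T, H)
    V = sum(Z[i][i] for i in range(k))
    # b_hat = T^{-1} Y^T (I - 11^T/n) x, the slopes of regression (1)
    b = [row[0] for row in solve(T, [[dot(c, xc)] for c in Yc])]
    s = [sum(b[j] * Y_cols[j][i] for j in range(k)) for i in range(n)]
    # slope of the simple regression of s on x (with intercept)
    beta_hat = dot(xc, centered(s)) / dot(xc, xc)
    return V, beta_hat


# Simulated example (arbitrary effect sizes, fixed seed for reproducibility)
rng = random.Random(0)
n, k = 60, 3
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
Y_cols = [[0.4 * (j + 1) * xi + rng.gauss(0.0, 1.0) for xi in x] for j in range(k)]
V, beta_hat = pillai_and_beta(x, Y_cols)
print(V, beta_hat)
```

The two printed numbers should agree up to floating-point rounding. Centering is used in place of an explicit intercept column, which is equivalent for the slopes.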
Given the numerical equivalence of $V$ and $\hat{\beta}$, $\hat{\beta}$ has the same distribution as $V$. Note that the
correlation coefficient between the fitted values and original response of regression (1) is the
same as that of regression (2). Assuming each column of $Y$ follows a Gaussian distribution, let
$R^{2}$ be the coefficient of determination of regressions (1) and (2); then in regression (2), we have

$$R^{2} = \hat{\beta}\,\frac{\mathrm{Cov}(x, s)}{\mathrm{Var}(s)}.$$

Since $x = a\mathbf{1} + s + e$ and $\mathrm{Cov}(s, a\mathbf{1} + e) = 0$, we have $\mathrm{Cov}(x, s) = \mathrm{Var}(s)$, so together with Theorem 1,

$$\hat{\beta} = V = R^{2}.$$


The F-statistic of the multiple regression (1) can be expressed as a function of $R^{2}$, i.e.

$$F = \frac{R^{2}/k}{(1 - R^{2})/(n - k - 1)},$$

which is the same formula as for the F-statistic of Pillai's trace $V$ [Pillai, 1960]. After rearranging,

$$\hat{\beta} = V = R^{2} = \frac{kF}{(n - k - 1) + kF}.$$

As $F \sim F(k, n - k - 1)$, we have

$$\hat{\beta} = V = R^{2} \sim \mathrm{Beta}\left(\frac{k}{2}, \frac{n - k - 1}{2}\right),$$

which is the exact distribution of $\hat{\beta}$. In practice, the standard error of $\hat{\beta}$ can be obtained by
Gaussian approximation of the Beta distribution, which simplifies the significance test of $\hat{\beta}$ as a
Wald test.
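
To make the relations above concrete, here is a small sketch in plain Python. The function names are hypothetical, and reading "Gaussian approximation" as matching the mean and standard deviation of the Beta distribution is an assumption of this sketch, not something the text specifies. It converts between $F$ and $R^{2}$ and returns the first two moments of $\mathrm{Beta}(k/2, (n - k - 1)/2)$.

```python
import math


def r2_from_f(f_stat, n, k):
    """R^2 = kF / ((n - k - 1) + kF)."""
    return k * f_stat / ((n - k - 1) + k * f_stat)


def f_from_r2(r2, n, k):
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def beta_moments(n, k):
    """Mean and standard deviation of Beta(k/2, (n - k - 1)/2).

    Assumption: these two moments are what a Gaussian approximation
    of the Beta distribution would match.
    """
    a = k / 2.0
    b = (n - k - 1) / 2.0
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, math.sqrt(var)


# Example with n = 101 observations and k = 4 dependent variables
mean, sd = beta_moments(101, 4)
print(mean, sd)
print(r2_from_f(f_from_r2(0.3, 101, 4), 101, 4))
```

The mean simplifies to $k/(n - 1)$, so with many phenotypes and few samples the reference distribution of $\hat{\beta}$ sits visibly away from zero.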

$\hat{\beta}$ is therefore a simple linear regression effect for a multivariate analysis. This single effect
$\hat{\beta}$ is particularly useful in multivariate biological studies, so that the biomarker effect can be
interpreted and replicated with meaning, i.e. the effect on the score $s$.